Simultaneous Testing of Grouped Hypotheses: Finding Needles in Multiple Haystacks

Authors

  • T. Tony CAI
  • Wenguang SUN
Abstract

In large-scale multiple testing problems, data are often collected from heterogeneous sources and hypotheses form into groups that exhibit different characteristics. Conventional approaches, including the pooled and separate analyses, fail to efficiently utilize the external grouping information. We develop a compound decision theoretic framework for testing grouped hypotheses and introduce an oracle procedure that minimizes the false nondiscovery rate subject to a constraint on the false discovery rate. It is shown that both the pooled and separate analyses can be uniformly improved by the oracle procedure. We then propose a data-driven procedure that is shown to be asymptotically optimal. Simulation studies show that our procedures enjoy superior performance and yield the most accurate results in comparison with both the pooled and separate procedures. A real-data example with grouped hypotheses is studied in detail using different methods. Both theoretical and numerical results demonstrate that exploiting external information of the sample can greatly improve the efficiency of a multiple testing procedure. The results also provide insights on how the grouping information is incorporated for optimal simultaneous inference.
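As a rough illustration of how grouping information might enter an FDR-controlling rule, the sketch below scores each hypothesis with a local false discovery rate computed from its own group's parameters, pools the scores across groups, and rejects the largest set of top-ranked hypotheses whose average score stays at or below the target level. The abstract does not spell out the procedure, so everything here is an assumption made for illustration: the two-component normal mixture, the known group parameters (an oracle-style setting), and the hypothetical names `local_fdr` and `pooled_lfdr_rejections`.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(z, null_prop, alt_mean, alt_sd=1.0):
    """Posterior probability that z is null, assuming (for illustration)
    a mixture null_prop * N(0, 1) + (1 - null_prop) * N(alt_mean, alt_sd^2)."""
    null_part = null_prop * normal_pdf(z)
    alt_part = (1.0 - null_prop) * normal_pdf(z, alt_mean, alt_sd)
    return null_part / (null_part + alt_part)


def pooled_lfdr_rejections(groups, alpha=0.10):
    """groups: list of (z_values, null_prop, alt_mean), one entry per group,
    with the group parameters assumed known.
    Returns the set of (group_index, item_index) pairs that are rejected."""
    scored = []
    for g, (z_values, null_prop, alt_mean) in enumerate(groups):
        for i, z in enumerate(z_values):
            # Each score uses the parameters of the hypothesis's own group.
            scored.append((local_fdr(z, null_prop, alt_mean), g, i))

    # Pool all groups and rank by score, smallest (most significant) first.
    scored.sort()

    # Keep the largest k whose running average score is at most alpha.
    k = 0
    running_sum = 0.0
    for rank, (score, _, _) in enumerate(scored, start=1):
        running_sum += score
        if running_sum / rank <= alpha:
            k = rank
    return {(g, i) for _, g, i in scored[:k]}


if __name__ == "__main__":
    # Two hypothetical groups with different null proportions and signal sizes.
    example = [
        ([0.1, 4.0, -0.3], 0.9, 3.0),
        ([2.5, 0.0], 0.5, 2.0),
    ]
    print(sorted(pooled_lfdr_rejections(example, alpha=0.10)))
```

In this toy input the same z-value can receive a different score depending on its group, which is the sense in which the grouping is used rather than ignored (pooled analysis) or treated in isolation (separate analyses). A data-driven version would replace the known parameters with estimates.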

Similar resources

Guest Editors' Introduction: Information Discovery--Needles and Haystacks

For thousands of years, people have realized the importance of archiving and finding information. With the advent of computers, it became possible to store large amounts of information in electronic form — and finding useful needles in the resulting haystacks has since become one of the most important problems in information management. Many systems exist to help users navigate the considerable...

Finding semantic needles in haystacks of Web text and links

Content and links are used to search, rank, cluster and classify Web pages. Here I analyze and visualize similarity relationships in massive Web datasets to identify how content and link analysis should be integrated for relevance approximation. Human-generated metadata from Web directories is used to estimate semantic similarity. Highly heterogeneous topical maps point to a critical dependence...

Bonferroni-based gatekeeping procedure with retesting option

In complex clinical trials, multiple research objectives are often grouped into sets of objectives based on their inherent hierarchical relationships. Consequently, the hypotheses formulated to address these objectives are grouped into ordered families of hypotheses and thus to be tested in a pre-defined sequence. In this paper, we introduce a novel Bonferroni based multiple testing procedure f...

GroupTest: Multiple Testing Procedure for Grouped Hypotheses

In the modern Big Data analysis, testing multiple hypotheses simultaneously has been an important tool in analyzing data arising from scientific studies, such as genetics, astronomy, social sciences and many others. The hypotheses can often be grouped together according to the nature of the scientific investigations. For instance, genes can be grouped according to gene pathways; the nearby pixe...

Finding Needles in Haystacks Is Not Hard with Neutrality

We propose building neutral networks in needle-in-haystack fitness landscapes to assist an evolutionary algorithm to perform search. The experimental results on four different problems show that this approach improves the search success rates in most cases. In situations where neutral networks do not give performance improvement, no impairment occurs either. We also tested a hypothesis proposed...

Publication date: 2009